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Abstract 

We construct an extension of diffusion geometry to multiple modalities through 
joint approximate diagonalization of Laplacian matrices. This naturally extends 
classical data analysis tools based on spectral geometry, such as diffusion maps 
and spectral clustering. We provide several synthetic and real examples of mani- 
fold learning, retrieval, and clustering demonstrating that the joint diffusion geom- 
etry frequently better captures the inherent structure of multi-modal data. We also 
show that many previous attempts to construct multimodal spectral clustering can 
be seen as particular cases of joint approximate diagonalization of the Laplacians. 



1 Introduction 



The Laplacian operator and related constructions play a pivotal role in a wide range of applications in 
machine learning, pattern recognition, and computer vision community. It has been shown that many 
problems in these fields boil down to finding some eigenvectors and eigenvalues of a Laplacian con- 



structed on some high-dimensional data. Important examples include spectral clustering (NgetaL 
(|2001 |Q where c lusters are determined by the first eigenv ectors of the Laplacian; e igenmapsjBdkin 



& Niyogi] ( |2002| )) and more generally diffusion maps ( [Coifman & Lafon| ( |2006| )), where one tries 



to find a low-dimensional manifold structure using the first smallest eigenvectors of the Laplacian; 
and diffusion metrics ([C oilman et al. ('2005')) measuring the "connectivity" of points on a manifold 
and expressed through the eigenvalues and eigenvectors of the Laplacian. Other applications heavi ly 
relying on the properties of the Laplacian include spectral graph partitioning ( Din g et al.|p001|)), 
spectra l hashing ( [Weiss et aL|(|2008|)), sp ectral correspondence, image segmentation ( |Shi & Malik 
( [1997 )), and shape analysis (Levy ( 2006 )). Because of the intimate relation between the Laplacian 
operator, Riemannian geometry, and diffusion processes, it is common to encounter the umbrella 
term spectral or diffusion geometry in relation to the above problems. 

These applications have been considered mostly in the context of uni-modal data, i.e., a single data 
space. However, many applications involve observations and measurements of data done using dif- 

~|( |20T0|); 



ferent modalities, such as multimedia documents ( Weston et al. 



Rasiwasi a et al.| (|2010| ) ; 



|McFee & Lanckriet| ( [20TTT )), audio and video ( |Kidron et al.|d2005t ;TAlameda-Pine da et al.| ( |2011| )) 
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or medical imaging modalities like PET and CT ( [Bronstein et al.| ( 2010 )). Such problems of mul- 



timodal (or multi-view) data analysis have gained increasing interest in the computer vision and 
pattern recognition communities, however, there have been only few attempts extending the power- 
ful spectral methods to such settings. 

In this paper, we propose a general framework allowing to extend different diffusion and spectral 
methods to the multimodal setting by finding a common eigenbasis of multiple Laplacians. Numeri- 
cally, this problem is posed as approximate joint diagonalization of several mat rices. Such methods 
have received limited attention in the numerical mathematics community (Bunse-Gerstner et al. 



(1993 )) and have been employed for joint diagonalization of covariance matrices in blind source 
separation applications by Cardoso & Soulou mlac] \ 1 993 1 1 1 996) ; |Yeredor| ( [2002] ) ; |Ziehe| ( [2005] ). To 
the best of our knowledge, this is the first time they are applied to spectral embeddings. Besides 
providing a principled approach to data fusion, our framework gives a theoretical explanation to 
existing methods for mul timoda l data analysis. In particul a r, we show that ma n y recent works on 
multi-view clusteri ng by |de Sa| ( [2005] ); |Ma & Lee| ( [2008] ); |Tang et al.| ( [2009] ); |Cai et al.| ([20TTJ; 



Kum ar et al.|p011| ) can be considered a particular instance of our framework. 



2 Background 

Let us be given some data represented as a /c-dimensional manifold X C R d , embedded into a tri- 
dimensional Euclidean space. In many applications d is very large while the intrinsic dimension of 
the data k is small, and one tries to study the structure of the manifold rather than its d-dimensional 
embedding. Such a structure can be characterized by the means of the Laplace -Beltrami opera- 
tor. In the discrete setting, the manifold is often represented by a weighted graph with vertices 
{xi, . . . ,x n } C X and edge weights = ^(x^Xj) representing local connectivity using e.g. 
Gaussian kernel (see von Luxbur g (2007)). The Laplace-Beltrami operator can be discretizecf] as 



L = D" 1/2 (D - W)D~ 1/2 , where W = (w^) and D = diagQ^. w^). Such a discretiza- 
tion is often referred to as symmetric normalized Laplacian and admits a unitary diagonalization 
L = VAV T , VV T = I n with the eigenvalues Ai = < A2 < . . . < A n . Geometric constructions 
associated with eigenvectors and eigenvalues of the Laplacian play an important role in machine 
learning, since several archetypical problems can be formulated in these terms: 

Eigenmaps. Non-linear dimensionality reduction methods try to capture the intrinsic low- 
dimensional structure of the manifold X. Belkin and Niyogi (2002) showed that finding a 
neighborhood-preserving /c-dimensional embedding of X can be posed as the minimum eigenvalue 
problem, 

min tr (V T LV) s.t. V T V = I. (1) 

VGl nXfe 

This problem is minimized by setting V to be the matrix containing the first k eigenvectors of L, 
thus effectively embedding the data by means of the eigenf unctions of the Laplace-Beltrami operator 
(the null eigenvector is usually discarded). Such an embedding is referred to as Laplacian eigenmap. 
More generally, a diffusion map is given as a mapping of the form \I/ = (K ( A 2 ) v 2 , . . . , if (A/^Vfe 



where K{\) is some transfer function acting as a "low-pass filter" on eigenvalues A (Coifman et al. 
(2005]); [Coifman & Lafon| ( |2006l )). 



Diffusion distances. Coifman et al. ( 2005[ [2006] ) related the eigenmaps to heat diffusion and 



random processes on manifolds and defined a family of diffusion metrics that in the most general 
setting can be written as 

d 2 (x i)Xi ) = Y,K{\i){v u - Vjl f = ||*(xi) - *( Xj )||l. (2) 

I 

Particular choice of if (A) = e~ xt gives the heat diffusion distance, related to the connectivity of 
points x^ , Xj on the manifold by means of diffusion process of length t. Such distances are intrinsic 
and thus invariant to manifold embedding and are robust to topological noise. 



Spectral clustering. Ng et al.| ( [200 1| ) showed a very efficient and robust clustering approach 



based on the observation that the multiplicity of the null eigenvalue of L is equal to the number 



1 There exist many different constructions of the discrete Laplacian. For the sake of simplicity, we adopt the 
symmetric Laplacian. Our framework is applicable to other discretization as well. 
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Figure 1: First and second rows: eigenfunctions of the Laplacians of two modalities of the Swiss 
roll. Third and fourth rows: joint eigenfunctions of the two Laplacians computed using JADE. Hot 
colors represent positive values; cold colors represent negative values. 

of connected components of X. The corresponding eigenvectors act as indicator functions of these 
components. Embedding the data using these eigenvectors and then applying some standard cluster- 
ing algorithm such as K-means was shown to produce significantly better results than clustering the 
high-dimensional data directly. 

3 Multimodal diffusion geometry 

Recently, we witness increasing popularity of attempts to analyze different "views" or modalities 
of data. Such data can be modeled as m different manifolds X 1 C R dl , . . . , X m C R dm , which 
can have embeddings of different dimensionality (di, . . . , d m ) and sometimes different structure. 
We are interested in analyzing these manifolds simultaneously in order to extract their joint intrinsic 
structure. We assume that we are given n corresponding samples {(x|, . . . , xjj C R di }^L t on the 
manifolds and can construct the Laplacian matrices Li , . . . , L m as described in the previous section. 

Trying to use the eigenvectors Vi, . . . , V m of the Laplacian matrices Li, . . . , L m is problematic: 
for a set of eigenvectors corresponding to an eigenvalue with multiplicity greater than one, we can 
talk only of eigen sub-space, and any basis spanning it is a valid set of eigenvectors. As a result, the 
eigenvectors of the Laplacians in different modalities can be substantially different (Figure [T] top). 

Joint diagonalization. A solution is to try to find the eigenbasis of the Laplacians simultaneously. 
This problem is known as joint diagonalization and consists of finding a set of joint orthogonal 

- t 

eigenvectors V such that V L^V = A$ are diagonal matrices of the eigenvalues of L$. Such a 
common eigenbasis solves the inherent ambiguity in the definition of the eigenvectors and "couples" 
different modalities (Figure [T] bottom). However, due to differences between the modalities and 
the presence of noise, the Laplacian matrices Li, . . . , L m rarely have a joint eigenbasis (iff they 
commute). It is still possible to find an approximate joint diagonalization by solving 



where off(X) is some off-diagonality criterion, e.g. the sum of squared off-diagonal elements, 

off(X) = || X — diag(X)||p. In this case, V T L^V are only approximately diagonal; we refer 

to the average of the diagonal elements A = ^ Y^iLx diag(V L^V) as the joint approximate 
eigenvalues of Li, . . . , L m . This definition allows us to naturally extend the diffusion geometric 



m 




(3) 
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methods discussed in the previous section (eigenmaps, diffusion distances, spectral clustering, etc.) 
to the multimodal setting by simply replacing the eigenvalues and eigenvectors of a single Laplacian 
Li by the joint eigenvectors V and eigenvalues A of multiple Laplacians Li, . . . , L m . 

Numerical computation. A numerical method for joint diagonalization based on a modified Jacobi 



iteration traces back to Buns e-Gerstner et aL] ([T993), and it has been used at about the same time by 
Cardoso and Souloumiac (1993 ; 1996) for joint diagonalization of covariance matrices in the context 
of blind source separation. The idea of the standard Jacobi method for eigenvalue calculation is to 
apply a sequence of plane rotations in order to sequentially minimize the off-diagonal elements of 
the given matrix. The rotation is applied "in-place" and does not require matrix multiplication. In the 
modified Jacobi method (referred to as JADE), the rotations are applied to reduce the off-diagonality 
criterion (15]) in each step. Let H pqcs the (complex) rotation matrix the entries of which are equal to 
those of the identity matrix except for the elements 



r pp r pq \ c s | 

Van Tan \ S C 1 



where | c \ 2 + | s | — 1. Cardoso & Souloumiac|( 1996) show that the problem 



ll2 ™ X ^" 1 ^ (5) 

c \ 2 + \s \ 2 =1 f—* 
1=1 

has a simple explicit solution based on a 3 x 3 eigenvalue problem. JADE is one of the most common 
algorithms in the field of joint diagonalization and has complexity comparable to that of the standard 
Jacobi method. There are other algorithms, like the ACDC method of |Yeredor (2002|), as well as 



different versions of the idea of minimizing a suitable cost function on the Stiefel manifold ( |Rahbar| 
|& ReillylpQOOl )). 

Analytic computation. In the spectral clustering problem, we are looking for the null eigenvectors 
of the Laplacian. Assuming that the first k eigenvalues of the Laplacians are zero, we want to find 

V £ R nxk such that L;V = for alH = 1, . . . , m and V T V = I by reformulating ^ ; 



i as 



V ^ 1 1 T \r\\2 „ ±. A7" T 1 



min > LiV p, s.t. V V = I. (6) 

i=i 

Since YlT=i II-^^IIf = ^ r (V T I^L^V), the problem can be equivalently recast as single- 
modality clustering with the "average" Laplacian matrix L = Y^Li ^-H^i- We can also consider 
other averaging operators, e.g. weighted arithmetic mean L = Y^T=i Wj ^ J ^ or h arm °rri c mean L = 
(Y^i=i -L^ 1 ) -1 - We discuss these methods in the next section. 

For zero eigenvalues, ([6]) is akin to which justifi es the successful u se of such "averaging" meth- 
ods in problems of multimodal spectral clustering (Ma & Lee ( 2008 ); |Cai et al.| ( [2011 )). However, 



iterative methods such as JADE explicitly minimizing the off-diagonality criterion (3]) are more 
generic and applicable to settings where one has to find all or many joint eigenvectors, e.g., for 
computing eigenmaps or diffusion distances. 



4 Relation to previous works 

There have been numerous recent works on multimodal spectral-type clustering proposing differ- 
ent ways of fusing multiple modalities based on different principles. Considering these methods 
through the prism of joint diagonalization, we show many commonalities and equivalences between 



algorithms stemming from different motivations and coming from various communities. Ma & Lee 
(2008) considered detection of shots in video sequences using fusion of video and audio information, 
employing for this purpose s pectral clustering of a Laplacian created as a weighted arithmetic mean 
of each modality Laplacian. |Tang et aL] ( [2009] ) used low-rank factorization of the weight matrix, 

\ « UAiU T by solving 

m 

^||W,-UA,U T |||, (7) 



trying to find a common factor U such that « UAiU T by solving 



mm 

UGM nXfe ,AiGff 
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using the quasi-Newton method. Besides the fact that the factorization is applied to the weight matrix 
(it can be equivalently applied to the Laplacian), we see h ere a (non-orth ogonal) joint diagonalization 
problem with an off-diagonality criterion considered by | Yeredor] p002| ) . 



Cai et al. (2011 ) proposed a method for multiview spectral clustering (MVSC) by solving 



mm 

V i ,VGM riX 



fc ^tr(V?L,V,)+a||V, 



V||| 



s.t. 



V T V = I 



The authors show that this problem can be equivalently posed as 

max tr (V T V™ (L< + al) -1 v) s.t. V T V = I, 



(8) 



(9) 



and then employ an iterative algorithm to find the solution V. First, we observe that problem ^ 
consists of m minimum-eigenvalue problems w.r.t. bases V^, with the addition of a coupling term, 
encouraging as close as possible to some common basis V (note that the authors do not impose 
orthogonality constraints V^V = I, but for a ^> 0, the proximity to orthogonal V makes V$ 
approximately orthogonal). Thus, it is possible to interpret ^ as a kind of joint diagonalization 
criterion. Second, problem d9l) can be rewritten as a minimum eigenvalue problem 



mm tr 

ver xfe 



V T V : 



(10) 



whose solution is given by the matrix composed of the first k eigenvectors of the matrix 
(Y^iLi fai + • For a > 0, this a regularized version of the harmonic mean of the Lapla 



cian matrices. We can thus regard the method of Cai et al. (201 1 ) as a particular instance of our joint 
diagonalization approach discussed in the previous section. 



Kumar et al.| ( |2011| ) proposed the centroid co-regularization approach for multimodal clustering 
based on the minimization of 



V,ViG 



min V tr (V^L; V;) - air (ViVjW T ) s.t. V* 



V T V 



(11) 



This function is alternatingly minimized, first with respect to the V$, then with respect to V. Prob- 
lems ( 11 ) and ^ are similar in their spirit (the first one uses dissimilarity || — V||| as coupling 
term, while the second one the similarity tr (ViVjW T ) = ||VjV||p)), and fall under our joint 
diagonalization framework. 

We must stress that these methods were developed for clustering problems where one has to find the 
null eigenvectors, and do not adapt easily to other applications of diffusion geometry where one has 
to find many or all joint eigenvectors of the Laplacians (e.g., computation of diffusion distanc es). 
In particular, iterative solvers used in Tang et al. (2009); Ku mar et al. ( |2011 ); Cai et al. ( 2011| ) do 
not scale up to such cases. On the other hand, algorithms such as modified Jacobi iteration (JADE) 
are made for finding a full set of joint eigenvectors and have the complexity akin to standard Jacobi 
iteration. Further speed-up might be achieved by making explicit use of the sparse structure of the 
Laplacian matrices, which is not taken advantage of in JADE. 



5 Results 

We tested the proposed approach on three applications: dimensionality reduction, diffusion distance, 
and spectral clustering. All the datasets and code generating the results in this section are available 
from anonymous . com. Additional results are shown in the supplementary material. 

Swiss rolls. In the first experiment, we used two Swiss roll surfaces with slightly different embed- 
ding as two different data modalities. The rolls were constructed in such a way that in each modality 



' Cai et al.| ( |201 1| > also impose a non-negativity constraint on the matrix V in order to obtain cluster indicators 
directly and bypass the K-means clustering stage. We ignore this additional constraint for the simplicity of 
discussion; such a constraint can be added to all the problems discussed in this paper. 
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Figure 2: Flattening the Swiss rolls: dimensionality reduction using unimodal (left, center) and 
multimodal (right) eigenmaps. Joint eigenvectors were computed using JADE. 




Modality 1 Modality 2 Multimodal 



Figure 3: Diffusion distances from the blue point to the rest of the points on the Swiss roll surfaces. 
Darker colors represent smaller distances. First and third columns show the connectivity used in the 
construction of the Laplacians. Joint eigenvectors were computed using JADE. 



there is topological noise (connectivity "across" the roll loops) at different points. Laplacians were 
constructed as in Belkin & Niyogi (2002) using 5-neighbor connectivity and Gaussian weights with 
scale parameter t. Figure [T] shows the first few eigenvectors computed using each Laplacian indi- 
vidually and jointly. Figure [2] shows two-dimensional embeddings of the same surfaces using the 
first non-trivial eigenvectors. When using joint eigenvectors, we are able to correctly capture the 
intrinsic structure of the data. Figure [3] shows the diffusion distance on the Swiss roll surfaces, com- 
puted using the first 100 eigenvectors and heat diffusion kernel K(X) = e _1000A . Topological noise 
is clearly visible especially in the first modality, resulting in the distance between two loops to be 
small. This phenomenon does not occur when using joint eigenvectors. 

Synthetic data clustering. In the second experiment, we performed clustering on several synthetic 
multimodal datasets. Laplacians were constructed using 15 n earest neighbors (10 for the ci rcles), 
and Gaussian weight selected using the self-tuning approach of Perona & Zelnik-Manor ( 2004). We 



compare spectral clustering based o n single modalities (SC-1 and S C-2) and joint diagonalization 
obtained us ing the JADE m ethod of |Cardoso & So uloumiacl( [l996| ); harmonic mean (JD-HM) of 
Laplacians ( |Cai et al. ( 2011| )); and a non- spectral Comraf clustering algorithm (Bekker man & Jeon 



2007|)) . Qual ity was measured using the clustering accuracy criterion as defined in Bekkerman 



|& Jeon| ( [2007| ). For Blobs, accuracy is averaged over 100 experiments ran on randomly generated 



datasets. 



The results are summarized in Figure [4] and Table [T] Surprisingly, the simple-minded averaging 
approach performs extremely well; this is consistent with the previously reported results and the 
success of the methods of |Cai et"aL| ( [201 1 > (essentially harmonic mean) and |Ma & Lee (2008]) 



(arithmetic mean). 





Clus. 


SC-1 


SC-2 


JADE 


JD-HM 


Comraf 


Blobs 


6 


91.0±7.2% 


90.8±7.2% 


97.3±4.2% 


98.3±3.0% 


86.9±8.6% 


Circles 


4 


65.9% 


63.4% 


100.0% 


99.8% 


31.4% 


NIPS 


4 


63.3% 


75.1% 


99.9% 


99.9% 


51.8% 


NUS 


7 


83.5% 


71.0% 


92.4% 


80.7% 


82.1% 




7 


73.3% 


76.2% 


86.7% 


84.8% 




Caltech 


20 


66.3% 


70.7% 


73.3% 


76.0% 





Table 1: Accuracy of different clustering methods. 



NUS d ataset. In the third experiment, we used a subset of the NUS -WIDE dataset Chu a et al. 
(2009) containing annotated images. The images were selected on purpose to have ambiguous con- 




Modality 1 Modality 2 Joint 

Figure 4: Clustering synthetics datasets. Marker size represents ground truth; marker color repre- 
sents segmentation results (ideally, markers of each type should have a single color). 



tent and annotations (e.g., swimming tigers are also tagged as "water" making them confuse e.g. 
with whales). As two different modalities, we used the 64-dimensional color histograms and 1000- 
dimensional bags of words. Laplacians were constructed using 10 nearest neighbors and Gaussian 
weight was selected using self-tuning. Table [T] shows the performance of different clustering meth- 
ods, and Figure [5] exemplifies the clustered images. 

Using JADE joint diagonalization, we produced all the joint eigenvectors of the two modalities 
Laplacians. Figure [5] (top) shows the distance matrices between the objects in the NUS dataset 
obtained usinguni- and multi-modal diffusion distances (computed with the first 100 eigenvectors 
according to Q using heat diffusion kernel K(X) = e _5A ). Ideally, the distance matrix should 
contain zero blocks on the diagonal (objects of the same class) and non-zero elsewhere (objects 
from different classes). Thresholding these distances at a set of levels and measuring the false posi- 
tives/true positive rates (FPR/TPR), we produce the ROC curves that clearly indicate the advantage 
of using multiple modalities (Figure [8]). 

In Figure [7] (top), we used the diffusion distance to progressively sample the NUS dataset using the 
farthest point sampling strategy: starting with some point, pick up the second one as most distant 
from the first; then the third as the most distant from the first and second, and so on. Such sampling 
is almost-optimal ( Hochbaum & Shmoys] ( |1985| )) and is known to produce a progressively refined 
r-covering of the set. In fact, the first 7 samples produced in this way cover all the classes present 
in the dataset, which is an indication of the meaningfulness of such a sampling. 

Caltech dataset. In the fourth experiment, we repeat ed the thi rd experiment on a subset of the 
Caltech-101 dataset with 7 and 20 image classes as in Cai et al. ( |2011 ). For each image, kernels 



arising from different visual descriptors were given. For the 7 -clusters experiment, we used the 
bio-inspired features and 4x4 pyramid histogram of visual words (PHOW); for the 20-clusters ex- 
periment, we used geometric blur and 4x4 PHOW descriptors as different modalities, respectively. 
Laplacians were constructed from these kernels using Gaussian weight selected with self-tuning. 
Diffusion distances were computed with the first 100 eigenvectors using the kernel K(X) = e _5A 
The results are shown in Figures [~~ 



6 Discussion and Conclusions 

We presented a framework for multi-modal data analysis using approximate joint diagonalization of 
Laplacian matrices, naturally extending the classical construction of diffusion geometry to the multi- 
modal setting. This construction allowed an almost straightforward extension of various diffusion- 
geometric data analysis tools such as spectral clustering and manifold learning based on diffusion 
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Figure 5: Spectral clustering of NUS dataset. Shown are a few images and corresponding tags 
belonging to the same cluster obtained using Tags (top row), Color histogram (second row), and 
joint modalities (third to fifth row). Groundtruth clusters are shown in different colors. 




Figure 6: Spectral clustering of CaltechlOl dataset. Shown are a few images and corresponding 
tags belonging to the same cluster obtained using ht_bio_105034 bio-inspired features (top row), 4x4 
PHOW (second row), and joint modalities (third and fourth row). Groundtruth clusters are shown in 
different colors. 



maps. In follow-up studies, we intend to show multi-modal extensions of other related techniques 
such as spectral hashing. 

We also showed that many previously proposed approaches to multi-modal spectral clustering are 
nearly equivalent and try to solve some version of the joint approximate diagonalization problem. 
From the numerical perspective, existing methods were tailored for computing the null joint eigen- 
vectors that are sought for in clustering problems. The underlying optimization problems are poorly 
suited for broader applications of diffusion geometry such as non-linear dimensionality reduction 
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0.076955 0.056509 0.041308 0.029242 0.022934 0.022272 0.020757 0.020374 0.019405 0.018488 






0.037303 0.028773 0.028185 0.026317 0.019344 0.019092 0.018469 0.017507 0.0171 0.015677 

Figure 7: Farthest point sampling of NUS (top) and Caltech (bottom) datasets using joint diffusion 
distance. First point is on the left. Numbers indicate the sampling radius. Note that in both cases, 
the first 7 samples cover all the image classes. 






Figure 8: Columns one to three: distance matrices, column four: ROC curves, computed on NUS 
(top) and Caltech (bottom) datasets using joint diffusion distance. Ambiguities are shown in white. 



and manifold learning, where many or all eigenvectors of the Laplacians are of interest. While ap- 
proximate joint diagonalization methods developed in the signal processing community for source 
separation problems can address the latter case, they were initially developed for full matrices and 
do not take advantage of the sparse structure of Laplacians. 

To the best of our knowledge, there currently exists no efficient tool to compute the joint eigenvectors 
of very large sparse matrices, akin Matlab's eigs. We believe that the presented construction 
makes the need of such a tool central enough to deserve the interest of the entire machine learning 
community. In future work, we will consider extending standard methods for eigendecomposition 
of large sparse matrices to the joint diagonalization case. 
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